Speech Recognition Only with Supra-segmental Features - Hearing Speech as Music -
Abstract
This paper proposes a novel paradigm of speech recognition in which only supra-segmental features are used. Absolute properties of speech events, such as formants and spectra, are discarded completely, and only the relative and differential properties of the events are extracted as phonic contrasts. These phonic contrasts are treated as supra-segmental features, and they are shown mathematically not to carry non-linguistic information such as speaker, age, or gender. This leads us to expect that speaker-independent speech recognition should be possible with reference models built from a single speaker's speech. Experiments on isolated vowel sequence recognition show that this expectation is correct and that the new paradigm performs better than the conventional one using more than four thousand speakers, even for noisy speech. Hearing sounds by capturing only their contrasts and their structure is common when listening to music, which suggests that the proposed paradigm hears speech as music.
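The invariance claim above can be illustrated with a small sketch. The abstract does not say which contrast measure or which model of speaker variation the paper uses, so everything here is an assumption chosen for illustration: each speech event is a 1-D Gaussian, the contrast between two events is their Bhattacharyya distance, and a change of speaker is modeled as one affine map applied to every event. The function names and the numeric values are hypothetical.

```python
import math

def bhattacharyya_1d(mu1, sd1, mu2, sd2):
    """Bhattacharyya distance between two 1-D Gaussians (assumed contrast measure)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    shape_term = 0.5 * math.log(var_sum / (2.0 * sd1 * sd2))
    return mean_term + shape_term

def contrast_matrix(events):
    """All pairwise contrasts between events given as (mean, std) tuples."""
    n = len(events)
    return [[bhattacharyya_1d(*events[i], *events[j]) for j in range(n)]
            for i in range(n)]

def affine_map(events, a, b):
    """Apply x -> a*x + b to every event (assumed model of a speaker change)."""
    return [(a * mu + b, abs(a) * sd) for mu, sd in events]

# Hypothetical events for one speaker: three vowels as (mean, std).
speaker_a = [(1.0, 0.5), (3.0, 0.8), (6.0, 0.4)]
# A second "speaker": the same events under a shared scale and shift.
speaker_b = affine_map(speaker_a, a=1.7, b=-2.0)

contrasts_a = contrast_matrix(speaker_a)
contrasts_b = contrast_matrix(speaker_b)

# Absolute properties differ between the two speakers ...
absolute_changed = speaker_a != speaker_b
# ... while every pairwise contrast agrees up to rounding error.
max_diff = max(abs(x - y)
               for row_a, row_b in zip(contrasts_a, contrasts_b)
               for x, y in zip(row_a, row_b))
print(absolute_changed, max_diff < 1e-9)
```

Under these assumptions the means and variances of the events change with the speaker, but the matrix of contrasts does not, which is the kind of property that would let reference models built from one speaker match another speaker's speech.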
Similar resources
The Effect of Using PRAAT Software on Pre-Intermediate EFL Learners’ Supra Segmental Features
The present study investigated the effect of using PRAAT, a free software package for the scientific analysis of speech in phonetics, on the supra-segmental features (i.e., intonation and stress) of pre-intermediate Iranian English as a foreign language (EFL) learners. The study used a quasi-experimental design with a pre-test and a post-test. In doing so...
Analysis and Modelling of Emotional Speech in Spanish
The importance of speech prosody for conveying emotional information has been extensively underlined in the literature. Major elements such as pitch, tempo and stress are presented as the main acoustic correlates of emotion in human speech. Nevertheless, as several authors have shown, voice quality is also a relevant feature in emotion recognition. In this paper, we present the prosodic analysi...
Syllabic Pitch Tuning for Neutral-to-emotional Voice Conversion
Prosody plays an important role in neutral-to-emotional voice conversion. Prosodic features such as pitch are usually estimated and altered at a segmental level, based on short windowing of the speech signal (where the signal is expected to be quasi-stationary). This results in a frame-wise change of acoustic parameters for synthesizing emotionalized speech. In order to convert a neutral speech to an...
Music Training Program: A Method Based on Language Development and Principles of Neuroscience to Optimize Speech and Language Skills in Hearing-Impaired Children
Introduction: In recent years, music has been employed in many intervention and rehabilitation programs to enhance cognitive abilities in patients. Numerous studies show that music therapy can help improve language skills in patients, including the hearing impaired. In this study, a new method of music training is introduced based on principles of neuroscience and capabilities of Persian languag...
Automatic Segmentation of Continuous Speech on Word Level Based on Supra-segmental Features
This article presents a cross-lingual study of Hungarian and Finnish on the segmentation of continuous speech at the word and phrase level by examining supra-segmental parameters. A word-level segmenter has been developed that can indicate word boundaries with acceptable precision for both languages. The ultimate aim is to increase the robustness of speech recognition on the lan...
Publication date: 2006